---
title: Storage configuration
description: Learn how to configure Delta Lake on different storage systems.
---

import { Tabs, TabItem, Aside } from "@astrojs/starlight/components";

Delta Lake ACID guarantees are predicated on the atomicity and durability guarantees of the storage system. Specifically, Delta Lake relies on the following when interacting with storage systems:

- **Atomic visibility**: There must a way for a file to visible in its entirety or not visible at all.
- **Mutual exclusion**: Only one writer must be able to create (or rename) a file at the final destination.
- **Consistent listing**: Once a file has been written in a directory, all future listings for that directory must return that file.

Because storage systems do not necessarily provide all of these guarantees out-of-the-box, Delta Lake transactional operations typically go through the [LogStore API](https://github.com/delta-io/delta/blob/master/storage/src/main/java/io/delta/storage/LogStore.java) instead of accessing the storage system directly. To provide the ACID guarantees for different storage systems, you may have to use different `LogStore` implementations. This article covers how to configure Delta Lake for various storage systems. There are two categories of storage systems:

- **Storage systems with built-in support**: For some storage systems, you do not need additional configurations. Delta Lake uses the scheme of the path (that is, `s3a` in `s3a://path`) to dynamically identify the storage system and use the corresponding `LogStore` implementation that provides the transactional guarantees. However, for S3, there are additional caveats on concurrent writes. See the [section on S3](#amazon-s3) for details.

- **Other storage systems**: The `LogStore`, similar to Apache Spark, uses Hadoop `FileSystem` API to perform reads and writes. So Delta Lake supports concurrent reads on any storage system that provides an implementation of `FileSystem` API. For concurrent writes with transactional guarantees, there are two cases based on the guarantees provided by `FileSystem` implementation. If the implementation provides consistent listing and atomic renames-without-overwrite (that is, `rename(..., overwrite = false)` will either generate the target file atomically or fail if it already exists with `java.nio.file.FileAlreadyExistsException`), then the default `LogStore` implementation using renames will allow concurrent writes with guarantees. Otherwise, you must configure a custom implementation of `LogStore` by setting the following Spark configuration:

```ini
spark.delta.logStore.<scheme>.impl=<full-qualified-class-name>
```

where `<scheme>` is the scheme of the paths of your storage system. This configures Delta Lake to dynamically use the given `LogStore` implementation only for those paths. You can have multiple such configurations for different schemes in your application, thus allowing it to simultaneously read and write from different storage systems.

<Aside type="note">
  - Delta Lake on local file system may not support concurrent transactional
  writes. This is because the local file system may or may not provide atomic
  renames. So you should not use the local file system for testing concurrent
  writes. - Before version 1.0, Delta Lake supported configuring LogStores by
  setting `spark.delta.logStore.class`. This approach is now deprecated. Setting
  this configuration will use the configured `LogStore` for all paths, thereby
  disabling the dynamic scheme-based delegation.
</Aside>

## Amazon S3

Delta Lake supports reads and writes to S3 in two different modes: Single-cluster and Multi-cluster.

|  | Single-cluster | Multi-cluster |
| --- | --- | --- |
| Configuration | Comes with Delta Lake out-of-the-box | Is experimental and requires extra configuration |
| Reads | Supports concurrent reads from multiple clusters | Supports concurrent reads from multiple clusters |
| Writes | Supports concurrent writes from a _single_ Spark driver | Supports multi-cluster writes |
| Permissions | S3 credentials | S3 and DynamoDB operating permissions |

### Single-cluster setup (default)

In this default mode, Delta Lake supports concurrent reads from multiple clusters, but concurrent writes to S3 must originate from a _single_ Spark driver in order for Delta Lake to provide transactional guarantees. This is because S3 currently does not provide mutual exclusion, that is, there is no way to ensure that only one writer is able to create a file.

<Aside type="caution">
  Concurrent writes to the same Delta table on S3 storage from multiple Spark
  drivers can lead to data loss. For a multi-cluster solution, please see the
  [Multi-cluster setup](#multi-cluster-setup) section below.
</Aside>

#### Requirements (S3 single-cluster)

- S3 credentials: [IAM roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html) (recommended) or access keys
- Apache Spark associated with the corresponding Delta Lake version.
- Hadoop's [AWS connector (hadoop-aws)](https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws/) for the version of Hadoop that Apache Spark is compiled for.

#### Quickstart (S3 single-cluster)

This section explains how to quickly start reading and writing Delta tables on S3 using single-cluster mode. For a detailed explanation of the configuration, see [Setup Configuration (S3 multi-cluster)](#setup-configuration-s3-multi-cluster).

1. Use the following command to launch a Spark shell with Delta Lake and S3 support (assuming you use Spark 4.0.0 which is pre-built for Hadoop 3.4.0):

<Tabs>
  <TabItem label="Bash">

    ```bash
    bin/spark-shell \
     --packages io.delta:delta-spark_2.13:4.0.0,org.apache.hadoop:hadoop-aws:3.4.0 \
     --conf spark.hadoop.fs.s3a.access.key=<your-s3-access-key> \
     --conf spark.hadoop.fs.s3a.secret.key=<your-s3-secret-key>
    ```

  </TabItem>
</Tabs>

2. Try out some basic Delta table operations on S3 (in Scala):

<Tabs>
  <TabItem label="Scala">
    ```scala
    // Create a Delta table on S3:
    spark.range(5).write.format("delta").save("s3a://<your-s3-bucket>/<path-to-delta-table>")

    // Read a Delta table on S3:
    spark.read.format("delta").load("s3a://<your-s3-bucket>/<path-to-delta-table>").show()
    ```

  </TabItem>
</Tabs>

For other languages and more examples of Delta table operations, see the [Quickstart](/quick-start/) page.

For efficient listing of Delta Lake metadata files on S3, set the configuration `delta.enableFastS3AListFrom=true`. This performance optimization is in experimental support mode. It will only work on `S3A` filesystems and will not work on [Amazon's EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-file-systems.html) default filesystem `S3`.

<Tabs>
  <TabItem label="Scala">
    ```scala
    bin/spark-shell \
      --packages io.delta:delta-spark_2.13:4.0.0,org.apache.hadoop:hadoop-aws:3.4.0 \
      --conf spark.hadoop.fs.s3a.access.key=<your-s3-access-key> \
      --conf spark.hadoop.fs.s3a.secret.key=<your-s3-secret-key> \
      --conf "spark.hadoop.delta.enableFastS3AListFrom=true
    ```
  </TabItem>
</Tabs>

### Multi-cluster setup

<Aside type="note">This support is new and experimental.</Aside>

This mode supports concurrent writes to S3 from multiple clusters and has to be explicitly enabled by configuring Delta Lake to use the right `LogStore` implementation. This implementation uses [DynamoDB](https://aws.amazon.com/dynamodb/) to provide the mutual exclusion that S3 is lacking.

<Aside type="caution">
  This multi-cluster writing solution is only safe when all writers use this
  `LogStore` implementation as well as the same DynamoDB table and region. If
  some drivers use out-of-the-box Delta Lake while others use this experimental
  `LogStore`, then data loss can occur.
</Aside>

#### Requirements (S3 multi-cluster)

- All of the requirements listed in [Requirements (S3 single-cluster)](#requirements-s3-single-cluster) section
- In additon to S3 credentials, you also need DynamoDB operating permissions

#### Quickstart (S3 multi-cluster)

This section explains how to quickly start reading and writing Delta tables on S3 using multi-cluster mode.

1. Use the following command to launch a Spark shell with Delta Lake and S3 support (assuming you use Spark 4.0.0 which is pre-built for Hadoop 3.4.0):

<Tabs>
  <TabItem label="Bash">
    ```bash
    bin/spark-shell \
     --packages io.delta:delta-spark_2.13:3,org.apache.hadoop:hadoop-aws:3.4.0,io.delta:delta-storage-s3-dynamodb:4.0.0 \
     --conf spark.hadoop.fs.s3a.access.key=<your-s3-access-key> \
     --conf spark.hadoop.fs.s3a.secret.key=<your-s3-secret-key> \
     --conf spark.delta.logStore.s3a.impl=io.delta.storage.S3DynamoDBLogStore \
     --conf spark.io.delta.storage.S3DynamoDBLogStore.ddb.region=us-west-2
    ```
  </TabItem>
</Tabs>

2. Try out some basic Delta table operations on S3 (in Scala):

<Tabs>
  <TabItem label="Scala">
    ```scala
    // Create a Delta table on S3:
    spark.range(5).write.format("delta").save("s3a://<your-s3-bucket>/<path-to-delta-table>")

    // Read a Delta table on S3:
    spark.read.format("delta").load("s3a://<your-s3-bucket>/<path-to-delta-table>").show()
    ```

  </TabItem>
</Tabs>

For other languages and more examples of Delta table operations, see the [Quickstart](/quick-start/) page.

#### Setup Configuration (S3 multi-cluster)

1. Create the DynamoDB table.

   You have the choice of creating the DynamoDB table yourself (recommended) or having it created for you automatically.

   - Creating the DynamoDB table yourself

     This DynamoDB table will maintain commit metadata for multiple Delta tables, and it is important that it is configured with the [Read/Write Capacity Mode](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html) (for example, on-demand or provisioned) that is right for your use cases. As such, we strongly recommend that you create your DynamoDB table yourself. The following example uses the AWS CLI. To learn more, see the [create-table](https://docs.aws.amazon.com/cli/latest/reference/dynamodb/create-table.html) command reference.

<Tabs>
  <TabItem label="Bash">
    ```bash 
    aws dynamodb create-table \
      --region us-east-1 \
      --table-name delta_log \
      --attribute-definitions AttributeName=tablePath,AttributeType=S \
      AttributeName=fileName,AttributeType=S \
      --key-schema AttributeName=tablePath,KeyType=HASH \
      AttributeName=fileName,KeyType=RANGE \
      --billing-mode PAY_PER_REQUEST 
    ```
  </TabItem>
</Tabs>

2. Follow the configuration steps listed in [Configuration (S3 single-cluster)](#configuration-s3-single-cluster) section.

3. Include the `delta-storage-s3-dynamodb` JAR in the classpath.

4. Configure the `LogStore` implementation in your Spark session.

   First, configure this `LogStore` implementation for the scheme `s3`. You can replicate this command for schemes `s3a` and `s3n` as well.

<Tabs>
  <TabItem label="ini">
    ```ini 
    spark.delta.logStore.s3.impl=io.delta.storage.S3DynamoDBLogStore 
    ```
  </TabItem>
</Tabs>

Next, specify additional information necessary to instantiate the DynamoDB client. You must instantiate the DynamoDB client with the same `tableName` and `region` each Spark session for this multi-cluster mode to work correctly. A list of per-session configurations and their defaults is given below:

<Tabs>
  <TabItem label="ini">
    ```ini
    spark.io.delta.storage.S3DynamoDBLogStore.ddb.tableName=delta_log
    spark.io.delta.storage.S3DynamoDBLogStore.ddb.region=us-east-1
    spark.io.delta.storage.S3DynamoDBLogStore.credentials.provider=<AWSCredentialsProvider* used by the client>
    spark.io.delta.storage.S3DynamoDBLogStore.provisionedThroughput.rcu=5
    spark.io.delta.storage.S3DynamoDBLogStore.provisionedThroughput.wcu=5
    ```
  </TabItem>
</Tabs>

#### Production Configuration (S3 multi-cluster)

By this point, this multi-cluster setup is fully operational. However, there is extra configuration you may do to improve performance and optimize storage when running in production.

1. Adjust your Read and Write Capacity Mode.

   If you are using the default DynamoDB table created for you by this `LogStore` implementation, its default RCU and WCU might not be enough for your workloads. You can [adjust the provisioned throughput](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ProvisionedThroughput.html#ProvisionedThroughput.CapacityUnits.Modifying) or [update to On-Demand Mode](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/WorkingWithTables.Basics.html#WorkingWithTables.Basics.UpdateTable).

2. Cleanup old DynamoDB entries using Time to Live (TTL).

   Once a DynamoDB metadata entry is marked as complete, and after sufficient time such that we can now rely on S3 alone to prevent accidental overwrites on its corresponding Delta file, it is safe to delete that entry from DynamoDB. The cheapest way to do this is using [DynamoDB's TTL](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html) feature which is a free, automated means to delete items from your DynamoDB table.

   Run the following command on your given DynamoDB table to enable TTL:

<Tabs>
  <TabItem label="Bash">
    ```bash 
    aws dynamodb update-time-to-live \
      --region us-east-1 \
      --table-name delta_log \
      --time-to-live-specification "Enabled=true, AttributeName=expireTime" 
    ```
  </TabItem>
</Tabs>

The default `expireTime` will be one day after the DynamoDB entry was marked as completed.

3. Cleanup old AWS S3 temp files using S3 Lifecycle Expiration.

   In this `LogStore` implementation, a temp file is created containing a copy of the metadata to be committed into the Delta log. Once that commit to the Delta log is complete, and after the corresponding DynamoDB entry has been removed, it is safe to delete this temp file. In practice, only the latest temp file will ever be used during recovery of a failed commit.

   Here are two simple options for deleting these temp files:

   1. Delete manually using S3 CLI.

      This is the safest option. The following command will delete all but the latest temp file in your given `<bucket>` and `<table>`:

<Tabs>
  <TabItem label="Bash">
    ```bash
    aws s3 ls s3://<bucket>/<delta_table_path>/_delta_log/.tmp/ --recursive | awk 'NF>1{print $4}' | grep . | sort | head -n -1  | while read -r line ; do
        echo "Removing ${line}"
        aws s3 rm s3://<bucket>/<delta_table_path>/_delta_log/.tmp/${line}
    done
    ```
  </TabItem>
</Tabs>

2.  Delete using an S3 Lifecycle Expiration Rule

    A more automated option is to use an [S3 Lifecycle Expiration rule](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html), with filter prefix pointing to the `<delta_table_path>/_delta_log/.tmp/` folder located in your table path, and an expiration value of 30 days.

  <Aside type="note">
  It is important that you choose a sufficiently large expiration value. As stated above, the latest temp file will be used during recovery of a failed commit. If this temp file is deleted, then your DynamoDB table and S3 `<delta_table_path>/_delta_log/.tmp/` folder will be out of sync.
  </Aside>

    There are a variety of ways to configuring a bucket lifecycle configuration, described in AWS docs [here](https://docs.aws.amazon.com/AmazonS3/latest/userguide/how-to-set-lifecycle-configuration-intro.html).

    One way to do this is using S3's `put-bucket-lifecycle-configuration` command. See [S3 Lifecycle Configuration](https://docs.aws.amazon.com/cli/latest/reference/s3api/put-bucket-lifecycle-configuration.html) for details. An example rule and command invocation is given below:

    In a file referenced as `file://lifecycle.json`:

<Tabs>
  <TabItem label="JSON">
    ```json
    {
      "Rules":[
        {
          "ID":"expire_tmp_files",
          "Filter":{
            "Prefix":"path/to/table/_delta_log/.tmp/"
          },
          "Status":"Enabled",
          "Expiration":{
            "Days":30
          }
        }
      ]
    }
    ```
  </TabItem>
</Tabs>

<Tabs>
  <TabItem label="Bash">
    ```bash
    aws s3api put-bucket-lifecycle-configuration \
      --bucket my-bucket \
      --lifecycle-configuration file://lifecycle.json
    ```
  </TabItem>
</Tabs>

<Aside type="note">
  AWS S3 may have a limit on the number of rules per bucket. See
  [PutBucketLifecycleConfiguration](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketLifecycleConfiguration.html)
  for details.
</Aside>

## Microsoft Azure storage

Delta Lake has built-in support for the various Azure storage systems with full transactional guarantees for concurrent reads and writes from multiple clusters.

Delta Lake relies on Hadoop `FileSystem` APIs to access Azure storage services. Specifically, Delta Lake requires the implementation of `FileSystem.rename()` to be atomic, which is only supported in newer Hadoop versions ([Hadoop-15156](https://issues.apache.org/jira/browse/HADOOP-15156) and [Hadoop-15086](https://issues.apache.org/jira/browse/HADOOP-15086)). For this reason, you may need to build Spark with newer Hadoop versions and use them for deploying your application. See [Specifying the Hadoop Version and Enabling YARN](https://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn) for building Spark with a specific Hadoop version and [Quickstart](/quick-start/) for setting up Spark with Delta Lake.

Here is a list of requirements specific to each type of Azure storage system:

### Azure Blob storage

#### Requirements (Azure Blob storage)

- A [shared key](https://docs.microsoft.com/rest/api/latest/storageservices/authorize-with-shared-key) or [shared access signature (SAS)](https://docs.microsoft.com/azure/storage/common/storage-dotnet-shared-access-signature-part-1)
- Delta Lake 0.2.0 or above
- Hadoop's Azure Blob Storage libraries for deployment with the following versions:
  - 2.9.1+ for Hadoop 2
  - 3.0.1+ for Hadoop 3
- Apache Spark associated with the corresponding Delta Lake version and [compiled with Hadoop version](https://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn) that is compatible with the chosen Hadoop libraries.

For example, a possible combination that will work is Delta 0.7.0 or above, along with Apache Spark 3.0 compiled and deployed with Hadoop 3.2.

#### Configuration (Azure Blob storage)

Here are the steps to configure Delta Lake on Azure Blob storage.

1. Include `hadoop-azure` JAR in the classpath. See the requirements above for version details.

2. Set up credentials.

   You can set up your credentials in the [Spark configuration property](https://spark.apache.org/docs/latest/configuration.html).

   We recommend that you use a SAS token. In Scala, you can use the following:

<Tabs>
  <TabItem label="Scala">
    ```scala
    spark.conf.set(
      "fs.azure.sas.<your-container-name>.<your-storage-account-name>.blob.core.windows.net",
       "<complete-query-string-of-your-sas-for-the-container>")
    ```
  </TabItem>
</Tabs>

Or you can specify an account access key:

<Tabs>
  <TabItem label="Scala">
    ```scala
    spark.conf.set(
      "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
       "<your-storage-account-access-key>")
    ```
  </TabItem>
</Tabs>

#### Usage (Azure Blob storage)

<Tabs>
  <TabItem label="Scala">
    ```scala
    spark.range(5).write.format("delta").save("wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<path-to-delta-table>")
    spark.read.format("delta").load("wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<path-to-delta-table>").show()
    ```
  </TabItem>
</Tabs>

### Azure Data Lake Storage Gen1

#### Requirements (ADLS Gen1)

- A [service principal](https://docs.microsoft.com/azure/active-directory/develop/app-objects-and-service-principals) for OAuth 2.0 access
- Delta Lake 0.2.0 or above
- Hadoop's Azure Data Lake Storage Gen1 libraries for deployment with the following versions:
  - 2.9.1+ for Hadoop 2
  - 3.0.1+ for Hadoop 3
- Apache Spark associated with the corresponding Delta Lake version and [compiled with Hadoop version](https://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn) that is compatible with the chosen Hadoop libraries.

#### Configuration (ADLS Gen1)

Here are the steps to configure Delta Lake on Azure Data Lake Storage Gen1.

1. Include `hadoop-azure-datalake` JAR in the classpath. See the requirements above for version details.

2. Set up Azure Data Lake Storage Gen1 credentials.

   You can set the following [Hadoop configurations](https://spark.apache.org/docs/latest/configuration.html#custom-hadoophive-configuration) with your credentials (in Scala):

<Tabs>
  <TabItem label="Scala">
    ```scala
    spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
    spark.conf.set("dfs.adls.oauth2.client.id", "<your-oauth2-client-id>")
    spark.conf.set("dfs.adls.oauth2.credential", "<your-oauth2-credential>")
    spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/<your-directory-id>/oauth2/token")
    ```
  </TabItem>
</Tabs>

#### Usage (ADLS Gen1)

<Tabs>
  <TabItem label="Scala">
    ```scala
    spark.range(5).write.format("delta").save("adl://<your-adls-account>.azuredatalakestore.net/<path-to-delta-table>")
    spark.read.format("delta").load("adl://<your-adls-account>.azuredatalakestore.net/<path-to-delta-table>").show()
    ```
  </TabItem>
</Tabs>

### Azure Data Lake Storage Gen2

#### Requirements (ADLS Gen2)

- A [service principal](https://docs.microsoft.com/azure/active-directory/develop/app-objects-and-service-principals) for OAuth 2.0 access or a [shared key](https://docs.microsoft.com/rest/api/latest/storageservices/authorize-with-shared-key)
- Delta Lake 0.2.0 or above
- Hadoop's Azure Data Lake Storage Gen2 libraries for deployment with the following versions:
  - 3.2.0+ for Hadoop 3
- Apache Spark associated with the corresponding Delta Lake version and [compiled with Hadoop version](https://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn) that is compatible with the chosen Hadoop libraries.

#### Configuration (ADLS Gen2)

Here are the steps to configure Delta Lake on Azure Data Lake Storage Gen2.

1. Include `hadoop-azure` and `azure-storage` JARs in the classpath. See the requirements above for version details.

2. Set up credentials.

   You can use either OAuth 2.0 with service principal or shared key authentication:

   For OAuth 2.0 with service principal (recommended):

<Tabs>
  <TabItem label="Scala">
    ```scala
    spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
    spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<application-id>")
    spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net", "<service-credential>")
    spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")
    ```
  </TabItem>
</Tabs>

For shared key authentication:

<Tabs>
  <TabItem label="Scala">
    ```scala
    spark.conf.set("fs.azure.account.key.<storage-account>.dfs.core.windows.net", "<storage-account-access-key>")
    ```
  </TabItem>
</Tabs>

#### Usage (ADLS Gen2)

<Tabs>
  <TabItem label="Scala">
    ```scala
    spark.range(5).write.format("delta").save("abfss://<container-name>@<storage-account>.dfs.core.windows.net/<path-to-delta-table>")
    spark.read.format("delta").load("abfss://<container-name>@<storage-account>.dfs.core.windows.net/<path-to-delta-table>").show()
    ```
  </TabItem>
</Tabs>

## HDFS

Delta Lake has built-in support for HDFS with full transactional guarantees for concurrent reads and writes from multiple clusters. No additional configuration is required.

#### Usage (HDFS)

<Tabs>
  <TabItem label="Scala">
    ```scala
    spark.range(5).write.format("delta").save("hdfs://<namenode>:<port>/<path-to-delta-table>")
    spark.read.format("delta").load("hdfs://<namenode>:<port>/<path-to-delta-table>").show()
    ```
  </TabItem>
</Tabs>

## Google Cloud Storage

Delta Lake has built-in support for Google Cloud Storage (GCS) with full transactional guarantees for concurrent reads and writes from multiple clusters.

### Requirements (GCS)

- Google Cloud Storage credentials
- Delta Lake 0.2.0 or above
- Hadoop's [GCS connector](https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage) for the version of Hadoop that Apache Spark is compiled for

### Configuration (GCS)

1. Include the GCS connector JAR in the classpath.

2. Set up credentials using one of the following methods:
   - Use [Application Default Credentials](https://cloud.google.com/docs/authentication/application-default-credentials)
   - Configure service account credentials in Spark configuration

<Tabs>
  <TabItem label="Scala">
    ```scala
    spark.conf.set("google.cloud.auth.service.account.json.keyfile", "<path-to-json-key-file>")
    ```
  </TabItem>
</Tabs>

### Usage (GCS)

<Tabs>
  <TabItem label="Scala">
    ```scala
    spark.range(5).write.format("delta").save("gs://<bucket>/<path-to-delta-table>")
    spark.read.format("delta").load("gs://<bucket>/<path-to-delta-table>").show()
    ```
  </TabItem>
</Tabs>

## Oracle Cloud Infrastructure

Delta Lake supports Oracle Cloud Infrastructure (OCI) Object Storage with full transactional guarantees for concurrent reads and writes from multiple clusters.

### Requirements (OCI)

- OCI credentials
- Delta Lake 0.2.0 or above
- Hadoop's [OCI connector](https://docs.oracle.com/en-us/iaas/Content/api/latest/SDKDocs/hdfsconnector.htm) for the version of Hadoop that Apache Spark is compiled for

### Configuration (OCI)

1. Include the OCI connector JAR in the classpath.

2. Set up credentials in Spark configuration:

<Tabs>
  <TabItem label="Scala">
    ```scala
    spark.conf.set("fs.oci.client.auth.tenantId", "<tenant-ocid>")
    spark.conf.set("fs.oci.client.auth.userId", "<user-ocid>")
    spark.conf.set("fs.oci.client.auth.fingerprint", "<api-key-fingerprint>")
    spark.conf.set("fs.oci.client.auth.pemfilepath", "<path-to-private-key-file>")
    ```
  </TabItem>
</Tabs>

### Usage (OCI)

<Tabs>
  <TabItem label="Scala">
    ```scala
    spark.range(5).write.format("delta").save("oci://<bucket>@<namespace>/<path-to-delta-table>")
    spark.read.format("delta").load("oci://<bucket>@<namespace>/<path-to-delta-table>").show()
    ```
  </TabItem>
</Tabs>

## IBM Cloud Object Storage

Delta Lake supports IBM Cloud Object Storage with full transactional guarantees for concurrent reads and writes from multiple clusters.

### Requirements (IBM COS)

- IBM Cloud Object Storage credentials
- Delta Lake 0.2.0 or above
- Hadoop's [Stocator connector](https://github.com/CODAIT/stocator) for the version of Hadoop that Apache Spark is compiled for

### Configuration (IBM COS)

1. Include the Stocator connector JAR in the classpath.

2. Set up credentials in Spark configuration:

<Tabs>
  <TabItem label="Scala">
    ```scala
    spark.conf.set("fs.cos.service.endpoint", "<endpoint-url>")
    spark.conf.set("fs.cos.service.access.key", "<access-key>")
    spark.conf.set("fs.cos.service.secret.key", "<secret-key>")
    ```
  </TabItem>
</Tabs>

### Usage (IBM COS)

<Tabs>
  <TabItem label="Scala">
    ```scala
    spark.range(5).write.format("delta").save("cos://<bucket>.<service>/<path-to-delta-table>")
    spark.read.format("delta").load("cos://<bucket>.<service>/<path-to-delta-table>").show()
    ```
  </TabItem>
</Tabs>
